Abstract: We propose an information-theoretic alternative to the popular Cronbach alpha coefficient
of reliability. Particularly suitable for contexts in which instruments are scored on a strictly nonnumeric
scale, our proposed index is based on functions of the entropy of the distributions defined on the
sample space of responses. Our reliability index tracks the Cronbach alpha coefficient uniformly while
offering several other advantages discussed in detail in this paper.

1. Introduction


Suppose that we are given a dataset represented by an $n \times p$ matrix $X$ whose $i$th row $\mathbf{x}_i^\top = (x_{i1}, x_{i2}, \ldots, x_{ip})$
denotes the $p$-tuple of characteristics, with each $x_{ij} \in \{1,2,3,4,5\}$ representing the Likert-type level (order) of
preference of respondent $i$ on item $j$. This Likert-type score is obtained by mapping the response
levels {Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree} to the pseudo-numbers {1, 2, 3, 4, 5}.

A crucial part of the analysis of questionnaire data is usually the calculation of Cronbach’s alpha coefficient,
which measures the internal consistency or reliability/quality of the data. Let $X = (X_1, X_2, \ldots, X_p)^\top$ be
a $p$-tuple representing the $p$ items of a questionnaire. Initially proposed by Cronbach (1951) and later used
and re-explained extensively by thousands of researchers and practitioners like Bland and Altman (1997),
Cronbach’s alpha coefficient is a function of the ratio of the sum of the idiosyncratic item variances over the
variance of the sum of the items, and is given by

$$\alpha = \frac{p}{p-1}\left[1 - \frac{\sum_{j=1}^{p} V(X_j)}{V\left(\sum_{j=1}^{p} X_j\right)}\right].$$


Cronbach’s $\alpha$ will be 1 if the items are all the same and 0 if none is related to another. This is because
the variance of the sum of a group of independent variables is the sum of their variances, which makes the
ratio above equal to 1 and hence $\alpha = 0$. If the variables are positively correlated, the variance of the sum
will be increased. If the items making up the score are all identical and so perfectly correlated, all the
$V(X_j)$ will be equal and $V\left(\sum_{j=1}^{p} X_j\right) = p^2 V(X_j)$, so that

$$\frac{\sum_{j=1}^{p} V(X_j)}{V\left(\sum_{j=1}^{p} X_j\right)} = \frac{1}{p}$$

and $\alpha = 1$.
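
The empirical version of Cronbach’s alpha coefficient of internal consistency is given by

$$\widehat{\alpha} = \frac{p}{p-1}\left[1 - \frac{\sum_{j=1}^{p}\left\{\sum_{i=1}^{n} x_{ij}^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_{ij}\right)^2\right\}}{\sum_{i=1}^{n}\left(\sum_{j=1}^{p} x_{ij}\right)^2 - \frac{1}{n}\left(\sum_{i=1}^{n}\sum_{j=1}^{p} x_{ij}\right)^2}\right].$$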
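As a rough illustration of the empirical coefficient, the following Python sketch computes it from sample variances, which is algebraically the same as the ratio of sums of squares. The function name `cronbach_alpha` and the small data matrix are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def cronbach_alpha(X):
    """Empirical Cronbach's alpha for an n x p score matrix (rows = respondents)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    item_vars = X.var(axis=0, ddof=1)       # sample variance of each item (column)
    total_var = X.sum(axis=1).var(ddof=1)   # sample variance of the respondents' total scores
    return (p / (p - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical data: 4 respondents, 3 items scored on a 1..5 Likert-type scale.
X = np.array([[1, 2, 1],
              [3, 3, 4],
              [4, 5, 5],
              [2, 2, 3]])
alpha = cronbach_alpha(X)

# Identical items (every column equal) should give alpha = 1.
X_identical = np.tile(np.array([[1.0], [3.0], [5.0], [2.0]]), (1, 4))
alpha_identical = cronbach_alpha(X_identical)
```

Note that the coefficient is undefined when the total scores have zero variance, a case this sketch does not guard against.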


Definition 1. Let $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ be a dataset with $\mathbf{x}_i^\top = (x_{i1}, x_{i2}, \ldots, x_{ip})$. An observation vector
$\mathbf{x}_i$ will be called a zero variation vector if $x_{ij} = \text{constant}$, $j = 1, \ldots, p$. Respondents with zero variation
response vectors will be referred to as single-minded respondents/evaluators.

In fact, zero variation responses essentially reduce a $p$-item survey to a single-item survey.

Theorem 1. Let $X = (X_1, X_2, \ldots, X_p)^\top$ be a $p$-tuple representing the $p$ items of a questionnaire. If $X$ is
zero variation, then Cronbach’s alpha coefficient will be equal to 1.

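Proof. If $X = (X_1, X_2, \ldots, X_p)^\top$ is zero variation, then $X_j = W$ for $j = 1, \ldots, p$, and $\sum_{j=1}^{p} X_j = pW$. As a
result, $\sum_{j=1}^{p} V(X_j) = pV(W)$ and $V\left(\sum_{j=1}^{p} X_j\right) = V(pW) = p^2 V(W)$. Therefore,

$$\alpha = \frac{p}{p-1}\left[1 - \frac{pV(W)}{p^2 V(W)}\right] = \frac{p}{p-1}\left[1 - \frac{1}{p}\right] = 1.$$

□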


We use a straightforward adaptation of the Cronbach’s alpha coefficient to measure respondent reliability. 

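Definition 2. Let $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ be a dataset with $\mathbf{x}_i^\top = (x_{i1}, x_{i2}, \ldots, x_{ip})$. Let the estimated variance
of the $i$th respondent be $S_i^2 = \sum_{j=1}^{p} (x_{ij} - \bar{x}_i)^2/(p-1)$. Let $Z_j = \sum_{i=1}^{n} x_{ij}$ represent the sum of the scores
given by all the $n$ respondents to item $j$. Our respondent reliability is estimated by

$$\widehat{\alpha} = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n}\left\{\sum_{j=1}^{p} x_{ij}^2 - \frac{1}{p}\left(\sum_{j=1}^{p} x_{ij}\right)^2\right\}}{\sum_{j=1}^{p}\left(\sum_{i=1}^{n} x_{ij}\right)^2 - \frac{1}{p}\left(\sum_{j=1}^{p}\sum_{i=1}^{n} x_{ij}\right)^2}\right].$$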


Given a data matrix $X$, respondent reliability can be computed in practice by simply taking Cronbach’s
alpha coefficient of $X^\top$, the transpose of the data matrix $X$. Let $m$ be the number of nonzero variation
response vectors. If $m/n$ is very small, then respondent reliability will be very poor.
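The transpose recipe can be sketched in a few lines of Python. The helper names (`cronbach_alpha`, `respondent_reliability`, `nonzero_variation_fraction`) and the toy matrix are hypothetical; this is an illustration under those assumptions, not a reference implementation.

```python
import numpy as np

def cronbach_alpha(X):
    """Empirical Cronbach's alpha for a score matrix with units in rows, items in columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (p / (p - 1)) * (1.0 - item_vars.sum() / total_var)

def respondent_reliability(X):
    """Respondent reliability: Cronbach's alpha applied to the transposed data matrix."""
    return cronbach_alpha(np.asarray(X).T)

def nonzero_variation_fraction(X):
    """Fraction m/n of respondents whose answers are not all identical."""
    X = np.asarray(X)
    return float(np.mean(X.max(axis=1) != X.min(axis=1)))

# Hypothetical data: the second respondent is a zero variation (single-minded) respondent.
X = np.array([[1, 2, 1],
              [3, 3, 3],
              [4, 5, 5],
              [2, 2, 3]])
reliability = respondent_reliability(X)
fraction = nonzero_variation_fraction(X)
```

After transposition, the per-column variances are exactly the respondent variances $S_i^2$ and the row totals are the item sums $Z_j$, which is why no separate formula is needed.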

Despite its widespread use on Likert-type data since its creation, Cronbach’s alpha coefficient is, rigorously
speaking, not suitable for categorical data, for the simple reason that averages on ordinal measurements are
difficult to interpret at best and misleading at worst. For many years, researchers working on the clustering
of Likert-type data inappropriately resorted to average-driven methods like kMeans clustering. Fortunately,
there has been a surge of contributions to the clustering of categorical data whereby appropriate methods
have been used. At the heart of the clustering of categorical data is the need to define an appropriate measure
of similarity. Recognizing the possibility of preprocessing Likert-type questionnaire data into a collection of
estimated probability distributions over the sample spaces of responses, many authors have developed powerful
and scalable techniques for clustering categorical data, most of them based on information-theoretic concepts
(Cover and Thomas, 1991) like entropy (Huang, 1998; Guha et al., 2000; Barbara et al., 2002;
San et al., 2004; Li et al., 2004; Chen and Liu, 2005; Li, 2006; Meila, 2007; Cai et al., 2007), mutual
information, and variation of information (Meila, 2003), along with many other distances and measures on
probability distributions like the Bhattacharya distance (Bhattacharya, 1943; Mak, 1996; Choi and Lee, 2003;
Goudail et al., 2004; You, 2009; Reyes-Aldasoro and Bhalerao, 2006), the Kullback-Leibler divergence and
the Hellinger distance, just to name a few. In this paper, we use information-theoretic tools and concepts to
create several measures of internal consistency of questionnaire data.

2. Information-Theoretic Measures of Internal Data Consistency

Let $X_j$ represent one of the questions on the questionnaire, and consider the $n$ responses $\{x_{1j}, \ldots, x_{ij}, \ldots, x_{nj}\}$
provided by the $n$ evaluators. Let $\mathbf{v}_j = (v_{j1}, \ldots, v_{jk}, \ldots, v_{jK})^\top$ denote the vector containing the relative
frequencies of each Likert level for question $j$. With a total of $n$ questionnaires collected, we have

$$v_{jk} = \frac{1}{n}\sum_{i=1}^{n} I(x_{ij} = k), \quad k = 1, 2, \ldots, K \ \text{ and } \ j = 1, 2, \ldots, p. \qquad (2.1)$$

Using (2.1), one can then form probabilistic vectors $\mathbf{v}_j = (v_{j1}, \ldots, v_{jk}, \ldots, v_{jK})$, for $j = 1, 2, \ldots, p$. Each
vector $\mathbf{v}_j$ essentially represents an approximate probability distribution on the sample space made up of the
$K$ response levels. Using this probabilistic representation of each question $j$, we can compare the variability
of each item of the questionnaire using the entropy, specifically

$$H(\mathbf{v}_j) = -\sum_{k=1}^{K} v_{jk} \log_2(v_{jk}). \qquad (2.2)$$
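To make (2.1) and (2.2) concrete, here is a minimal Python sketch. The function names `level_frequencies` and `entropy`, the toy data, and the convention $0 \log_2 0 = 0$ are assumptions made for illustration.

```python
import numpy as np

def level_frequencies(X, K):
    """Return the p x K matrix V with V[j, k-1] = relative frequency of level k on item j."""
    X = np.asarray(X)
    levels = np.arange(1, K + 1)
    # Compare every entry with every level (n x p x K), then average over respondents.
    return (X[:, :, None] == levels).mean(axis=0)

def entropy(v):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    v = np.asarray(v, dtype=float)
    nz = v > 0
    return float(-(v[nz] * np.log2(v[nz])).sum())

# Hypothetical data: 4 respondents, 2 items, K = 4 levels.
# Item 1 receives the same answer from everyone; item 2 receives every level once.
X = np.array([[1, 1],
              [1, 2],
              [1, 3],
              [1, 4]])
V = level_frequencies(X, K=4)
item_entropies = [entropy(v) for v in V]
```

In this toy example the first item has zero entropy (complete agreement) and the second reaches the maximum $\log_2 4 = 2$ bits.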

We can imagine a transformation of the $n \times p$ data matrix $X$ into a probabilistic $p \times K$ counterpart $V$
where each row represents the approximate probability distribution of the corresponding question (item). The
entropy of each question indicates the variability of the answers given by students on that question. For a
given course and a given instructor, a small value of this entropy would indicate a greater degree of agreement
among his/her students on that item, and therefore suggest a more careful examination of the scores on that item.
As far as the relationship between items is concerned, information theory also provides a wealth of measures.
The symmetrized Kullback-Leibler divergence given by

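$$\mathrm{KL}_2(\mathbf{v}_i, \mathbf{v}_j) = \frac{1}{2}\left\{\mathrm{KL}(\mathbf{v}_i, \mathbf{v}_j) + \mathrm{KL}(\mathbf{v}_j, \mathbf{v}_i)\right\} = \frac{1}{2}\sum_{k=1}^{K}\left\{v_{ik}\log\left(\frac{v_{ik}}{v_{jk}}\right) + v_{jk}\log\left(\frac{v_{jk}}{v_{ik}}\right)\right\},$$

where

$$\mathrm{KL}(\mathbf{v}_i, \mathbf{v}_j) = \sum_{k=1}^{K} v_{ik}\log\left(\frac{v_{ik}}{v_{jk}}\right) \quad \text{and} \quad \mathrm{KL}(\mathbf{v}_j, \mathbf{v}_i) = \sum_{k=1}^{K} v_{jk}\log\left(\frac{v_{jk}}{v_{ik}}\right),$$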


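is usually the default measure used by most authors. The Kullback-Leibler divergence is closely related to the
mutual information

$$I(\mathbf{v}_i, \mathbf{v}_j) = \sum_{k=1}^{K}\sum_{l=1}^{K} v_{ik,jl}\,\log_2\left(\frac{v_{ik,jl}}{v_{ik}\,v_{jl}}\right),$$

which has been used extensively in machine learning to define a distance known as the Variation of
Information, defined by

$$\mathrm{VI}(\mathbf{v}_i, \mathbf{v}_j) = H(\mathbf{v}_i) + H(\mathbf{v}_j) - 2I(\mathbf{v}_i, \mathbf{v}_j).$$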
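The mutual information and the variation of information between two items can be sketched as follows. The text does not define the joint term, so this sketch assumes it is the joint relative frequency with which a respondent answers level $k$ on item $i$ and level $l$ on item $j$. All function names are hypothetical.

```python
import numpy as np

def joint_frequencies(x, y, K):
    """K x K table of joint relative frequencies for two items answered by the same respondents."""
    J = np.zeros((K, K))
    for a, b in zip(x, y):
        J[a - 1, b - 1] += 1
    return J / len(x)

def entropy(v):
    """Shannon entropy in bits, with 0 * log2(0) = 0."""
    v = np.asarray(v, dtype=float)
    nz = v > 0
    return float(-(v[nz] * np.log2(v[nz])).sum())

def mutual_information(J):
    """Mutual information in bits computed from a joint frequency table."""
    outer = np.outer(J.sum(axis=1), J.sum(axis=0))  # product of the two marginals
    nz = J > 0
    return float((J[nz] * np.log2(J[nz] / outer[nz])).sum())

def variation_of_information(J):
    """VI = H(v_i) + H(v_j) - 2 I(v_i, v_j)."""
    return entropy(J.sum(axis=1)) + entropy(J.sum(axis=0)) - 2.0 * mutual_information(J)

# Hypothetical examples: two identical items, and two items that vary independently.
J_same = joint_frequencies([1, 2, 3, 4], [1, 2, 3, 4], K=4)
J_indep = joint_frequencies([1, 1, 2, 2], [1, 2, 1, 2], K=2)
vi_same = variation_of_information(J_same)
vi_indep = variation_of_information(J_indep)
```

Identical items give a variation of information of zero, while the unrelated pair attains the sum of the two item entropies.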


Many other non-information-theoretic similarity and variation measures operating on probabilistic vectors can
be used to further investigate several aspects of the categorical data at hand. One that has been extensively
used in the machine learning and data mining community is the Bhattacharya distance (Bhattacharya, 1943),
given by

$$\mathrm{BC}(\mathbf{v}_i, \mathbf{v}_j) = -\log F(\mathbf{v}_i, \mathbf{v}_j),$$

where

$$F(\mathbf{v}_i, \mathbf{v}_j) = \sum_{k=1}^{K} \sqrt{v_{ik}\,v_{jk}}$$

is known as the Bhattacharya coefficient or fidelity coefficient. The Bhattacharya distance $\mathrm{BC}(\mathbf{v}_i, \mathbf{v}_j)$ measures
the overlap between $\mathbf{v}_i$ and $\mathbf{v}_j$, and has been widely used in various data mining
and machine learning applications (Mak, 1996; Choi and Lee, 2003; Goudail et al., 2004; You, 2009). It is
interesting to note that the Bhattacharya distance is related to the total variation measure defined by

$$\Delta(\mathbf{v}_i, \mathbf{v}_j) = \frac{1}{2}\sum_{k=1}^{K} |v_{ik} - v_{jk}| = \frac{1}{2}\|\mathbf{v}_i - \mathbf{v}_j\|_1,$$


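where $\|\cdot\|_1$ is the $\ell_1$ norm. Another very commonly used distance is the Hellinger distance between $\mathbf{v}_i$ and
$\mathbf{v}_j$, given by

$$\mathrm{Hellinger}(\mathbf{v}_i, \mathbf{v}_j) = \frac{1}{\sqrt{2}}\sqrt{\sum_{k=1}^{K}\left(\sqrt{v_{ik}} - \sqrt{v_{jk}}\right)^2} = \frac{1}{\sqrt{2}}\left\|\sqrt{\mathbf{v}_i} - \sqrt{\mathbf{v}_j}\right\|_2,$$

where $\|\cdot\|_2$ is the Euclidean norm or $\ell_2$ norm, $\sqrt{\mathbf{v}_i} = (\sqrt{v_{i1}}, \ldots, \sqrt{v_{iK}})$ and $\sqrt{\mathbf{v}_j} = (\sqrt{v_{j1}}, \ldots, \sqrt{v_{jK}})$.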
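The four distances on probability vectors discussed above can be sketched in Python as follows. The function names are hypothetical, natural logarithms are assumed wherever the text writes "log", and the symmetrized Kullback-Leibler divergence is only defined here for strictly positive vectors.

```python
import numpy as np

def symmetrized_kl(a, b):
    """(KL(a, b) + KL(b, a)) / 2, which simplifies to 0.5 * sum((a - b) * log(a / b))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 0.5 * float(np.sum((a - b) * np.log(a / b)))

def bhattacharya_distance(a, b):
    """Negative log of the Bhattacharya (fidelity) coefficient sum_k sqrt(a_k * b_k)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(-np.log(np.sum(np.sqrt(a * b))))

def total_variation(a, b):
    """Half of the l1 distance between the two probability vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 0.5 * float(np.sum(np.abs(a - b)))

def hellinger(a, b):
    """Euclidean distance between the square-root vectors, scaled by 1/sqrt(2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(np.sqrt(a) - np.sqrt(b)) / np.sqrt(2.0))

# Hypothetical item distributions over K = 2 response levels.
v1 = [0.5, 0.5]
v2 = [0.9, 0.1]
distances = {
    "KL2": symmetrized_kl(v1, v2),
    "BC": bhattacharya_distance(v1, v2),
    "TV": total_variation(v1, v2),
    "Hellinger": hellinger(v1, v2),
}
```

All four quantities vanish when the two distributions coincide; the total variation and Hellinger distances are bounded by 1, while the other two are unbounded.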

Definition 3. Let $\mathcal{Q}$ denote an instrument (questionnaire) for which the realized matrix of obtained responses
is given by $X$ with entries $x_{ij} \in \{1, 2, \ldots, K\}$. We propose an information-theoretic measure of the reliability
of $\mathcal{Q}$, referred to as the information consistency ratio of $\mathcal{Q}$ and given by

$$\varphi = 1 - \frac{\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{\max_{\mathbf{z}}\{H(\mathbf{z})\}} = 1 - \frac{\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{\log_2(K)}, \qquad (2.3)$$

where each $\mathbf{z}_i = \{z_{ik},\ k = 1, 2, \ldots, K\}$ defines an approximate probability distribution on the sample space of
possible responses, and $H(\cdot)$ is the entropy function, with

$$z_{ik} = \frac{1}{p}\sum_{j=1}^{p} I(x_{ij} = k) \quad \text{and} \quad H(\mathbf{z}_i) = -\sum_{k=1}^{K} z_{ik}\log_2(z_{ik}). \qquad (2.4)$$
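A minimal Python sketch of the information consistency ratio of (2.3) and (2.4) follows. The function names and the toy matrices are hypothetical, and the convention $0 \log_2 0 = 0$ is assumed.

```python
import numpy as np

def entropy(v):
    """Shannon entropy in bits, with 0 * log2(0) = 0."""
    v = np.asarray(v, dtype=float)
    nz = v > 0
    return float(-(v[nz] * np.log2(v[nz])).sum())

def respondent_distributions(X, K):
    """n x K matrix Z with Z[i, k-1] = fraction of respondent i's answers equal to level k."""
    X = np.asarray(X)
    levels = np.arange(1, K + 1)
    return (X[:, :, None] == levels).mean(axis=1)

def information_consistency_ratio(X, K):
    """phi = 1 - min_i H(z_i) / log2(K)."""
    Z = respondent_distributions(X, K)
    min_entropy = min(entropy(z) for z in Z)
    return 1.0 - min_entropy / np.log2(K)

# Hypothetical data with K = 2 levels.
X_spread = np.array([[1, 2],
                     [2, 1]])      # every respondent uses both levels equally often
X_constant = np.array([[1, 1],
                       [2, 1]])    # the first respondent gives a constant answer
phi_spread = information_consistency_ratio(X_spread, K=2)
phi_constant = information_consistency_ratio(X_constant, K=2)
```

Because the ratio uses the minimum respondent entropy, a single zero variation respondent is enough to push it to 1, as the second toy matrix shows.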


Lemma 1. Let $\mathbf{z}$ denote any probability measure defined on some $K$-dimensional sample space, with each
$z_k = \Pr\{E_k\}$, $k = 1, 2, \ldots, K$. Let $H(\cdot)$ denote the entropy function, such that for every $\mathbf{z}$, we have
$H(\mathbf{z}) = -\sum_{k=1}^{K} z_k \log_2(z_k)$. Then

$$\max_{\mathbf{z}}\{H(\mathbf{z})\} = \log_2(K).$$


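Proof. Since entropy essentially measures uncertainty (disturbance), the probability measure for which the
uncertainty is the largest is the probability measure $\mathbf{z}^*$ in which all the events are equally likely, i.e.,
$z_k^* = \Pr\{E_k\} = \frac{1}{K}$, $k = 1, 2, \ldots, K$. Therefore,

$$\max_{\mathbf{z}}\{H(\mathbf{z})\} = H(\mathbf{z}^*) = H\left(\frac{1}{K}, \ldots, \frac{1}{K}\right) = -\sum_{k=1}^{K}\frac{1}{K}\log_2\left(\frac{1}{K}\right) = \log_2(K).$$

□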
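The proof of Lemma 1 appeals to the intuition that the uniform measure is the most uncertain one. One standard way to make the maximization step explicit, offered here as a sketch rather than as part of the original argument, is Jensen's inequality applied to the concave function $\log_2$:

```latex
\begin{align*}
H(\mathbf{z})
  &= \sum_{k:\, z_k > 0} z_k \log_2\!\left(\frac{1}{z_k}\right)
   \;\le\; \log_2\!\left(\sum_{k:\, z_k > 0} z_k \cdot \frac{1}{z_k}\right)
   && \text{(Jensen, since $\log_2$ is concave)} \\
  &= \log_2 \bigl|\{k : z_k > 0\}\bigr|
   \;\le\; \log_2(K),
\end{align*}
% Both inequalities become equalities exactly when z_k = 1/K for all k.
```

Equality holds throughout only when every $z_k$ equals $1/K$, which recovers the uniform measure used in the proof.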

Proposition 1. Let $\mathcal{Q}_0$ denote a special questionnaire whose items are all mutually independent (unrelated).
Then the corresponding information consistency ratio $\varphi_0$ of $\mathcal{Q}_0$ is such that

$$\lim_{p \to \infty} \varphi_0 = 0.$$


Proof. With $\mathcal{Q}_0$ denoting a questionnaire whose items are all mutually independent (unrelated), the
matrix of realized responses has entries $x_{ij}$ that are realizations of the discrete uniform distribution on $\{1, 2, \ldots, K\}$,
or specifically, $x_{ij} \sim \mathrm{uniform}(1, 2, \ldots, K)$. It follows that for each $i = 1, 2, \ldots, n$, we must have

$$\lim_{p \to \infty} z_{ik} = \lim_{p \to \infty}\left\{\frac{1}{p}\sum_{j=1}^{p} I(x_{ij} = k)\right\} = \frac{1}{K}, \quad k = 1, 2, \ldots, K.$$
In other words, given enough questions (items), the empirical proportion of answers will converge to its
theoretical counterpart by the law of large numbers. Under the uniform generation of answers, we therefore
have the limiting distribution

$$\lim_{p \to \infty} \mathbf{z}_i = \mathbf{z}^*.$$

Finally, since all the response distributions will tend to converge to the same maximal measure $\mathbf{z}^*$, i.e. $\mathbf{z}_i \to \mathbf{z}^*$
for $i = 1, 2, \ldots, n$, we must have

$$\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\} \to H(\mathbf{z}^*) = \max_{\mathbf{z}}\{H(\mathbf{z})\},$$

and therefore

$$\lim_{p \to \infty} \varphi_0 = 1 - \lim_{p \to \infty}\frac{\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{\max_{\mathbf{z}}\{H(\mathbf{z})\}} = 1 - \frac{H(\mathbf{z}^*)}{H(\mathbf{z}^*)} = 1 - 1 = 0.$$

□


Proposition 2. Let $\mathcal{Q}_+$ denote a special questionnaire whose items are all identical. Then the corresponding
information consistency ratio $\varphi_+$ of $\mathcal{Q}_+$ is such that $\lim \varphi_+ = 1$.


Proof. With $\mathcal{Q}_+$ denoting a questionnaire whose items are all identical, the matrix of realized responses
has entries $x_{ij} = c_i$, for some constant $c_i \in \{1, 2, \ldots, K\}$. Then for each $i = 1, 2, \ldots, n$, there exists
$k_+ \in \{1, 2, \ldots, K\}$ such that

$$z_{ik} = \begin{cases} 1 & k = k_+ \\ 0 & k \neq k_+ \end{cases}$$

In other words, with $\mathcal{Q}_+$, the approximate distributions $\mathbf{z}_i$ of the answers of each respondent are of the form
$(1, 0, \ldots, 0)$, or $(0, 1, \ldots, 0)$, or $(0, 0, \ldots, 1)$. Therefore, for $\mathcal{Q}_+$, we must have $H(\mathbf{z}_i) = 0$, $i = 1, \ldots, n$, with
the result being $\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\} = 0$, and therefore

$$\varphi_+ = 1 - \frac{\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{\max_{\mathbf{z}}\{H(\mathbf{z})\}} = 1 - \frac{0}{H(\mathbf{z}^*)} = 1 - 0 = 1.$$

□


Definition 4. Let $Y_i$ represent the most frequently occurring answer in respondent $i$’s vector of $p$ answers.
It is easy to see that $Y_i$ has the same sample space as each question/item, namely the same Likert scale in
our case. Using the random variables $Y_i$, we can then define $\mathbf{w} = (w_1, \ldots, w_k, \ldots, w_K)^\top$ in the same manner
that we defined $\mathbf{v}_j$ earlier. More specifically, we have

$$Y_i = \underset{k=1,\ldots,K}{\arg\max}\left\{\frac{1}{p}\sum_{j=1}^{p} I(x_{ij} = k)\right\} \quad \text{and} \quad w_k = \frac{1}{n}\sum_{i=1}^{n} I(Y_i = k). \qquad (2.5)$$

The entropy of $\mathbf{w}$ is given by

$$H(\mathbf{w}) = -\sum_{k=1}^{K} w_k \log_2(w_k). \qquad (2.6)$$

The random variable $Y_i$ is maximal in a set-theoretic sense, and can be thought of as the categorical
analogue of the sum of numeric $X_j$’s. Using $\mathbf{w}$, an alternative definition of the information consistency ratio
$\varphi$ is

$$\varphi = 1 - \frac{\min_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{H(\mathbf{w})}. \qquad (2.7)$$

Fig 1. Comparative curves of $\varphi$ and Cronbach alpha as measures of internal consistency (horizontal axis: percentage of perfectly identical items).


An even more stringent version of the information consistency ratio is given by

$$\varphi = 1 - \frac{\max_{i=1,\ldots,n}\{H(\mathbf{z}_i)\}}{H(\mathbf{w})}. \qquad (2.8)$$
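The two variants (2.7) and (2.8) can be sketched together in Python. Function names are hypothetical; ties in the modal answer are broken in favor of the smallest level, which is an assumption the text does not specify, and the sketch raises an error when $H(\mathbf{w}) = 0$ because the ratios are then undefined.

```python
import numpy as np

def entropy(v):
    """Shannon entropy in bits, with 0 * log2(0) = 0."""
    v = np.asarray(v, dtype=float)
    nz = v > 0
    return float(-(v[nz] * np.log2(v[nz])).sum())

def icr_variants(X, K):
    """Return (phi_min, phi_max): the ratios of (2.7) and (2.8), both normalized by H(w)."""
    X = np.asarray(X)
    n = X.shape[0]
    levels = np.arange(1, K + 1)
    Z = (X[:, :, None] == levels).mean(axis=1)      # respondent distributions z_i (n x K)
    H = np.array([entropy(z) for z in Z])
    Y = Z.argmax(axis=1) + 1                        # modal answer; ties go to the smallest level
    w = np.bincount(Y, minlength=K + 1)[1:] / n     # distribution of the modal answers
    Hw = entropy(w)
    if Hw == 0:
        raise ValueError("H(w) = 0: all respondents share the same modal answer")
    return 1.0 - H.min() / Hw, 1.0 - H.max() / Hw

# Hypothetical data with K = 2 levels and 4 items.
X = np.array([[1, 1, 1, 2],
              [2, 2, 2, 1]])
phi_min, phi_max = icr_variants(X, K=2)
```

The sketch returns the raw ratios without clipping, so the max-based variant can never exceed the min-based one.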


3. Demonstration of Properties of $\varphi$

We use a simple simulation setup to empirically compare the different measures presented in this paper. We
set $p = 50$ and $n = 1000$ and we vary the fraction of perfectly reliable components from 10% to 100% in steps of 10%.

For $i = 1, \ldots, n$ and $j = 1, \ldots, p$, draw the $x_{ij}$’s uniformly with replacement from $\{1, 2, \ldots, K\}$, that is,

Draw $x_{ij} \sim \mathrm{uniform}(1, 2, \ldots, K)$.

Randomly replace $100c\%$ of the columns of $X$ with the same column of constant values, where $c \in \{0.1, 0.2, \ldots, 0.9, 1\}$.
Table (1) shows the simulated values of the information consistency ratio and Cronbach’s alpha coefficient for
different fractions of reliable components in the instrument. Figure (1) is a direct pictorial representation
of the numbers from Table (1), and we can see that the Cronbach alpha coefficient is less strict than the
information consistency ratio.

Fraction c            0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
[row label missing]   0.230  0.270  0.330  0.440  0.520  0.630  0.740  0.900  1.000  1.000
φ                     0.000  0.000  0.020  0.080  0.140  0.240  0.360  0.520  0.720  1.000
Cronbach              0.380  0.700  0.820  0.910  0.940  0.960  0.980  0.990  1.000  1.000

Table 1
Simulated values of the information consistency ratio and Cronbach’s alpha coefficient for different fractions of reliable
components in the instrument.
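The simulation protocol of this section can be sketched in Python as follows. Several choices here are assumptions rather than details taken from the text: $K = 5$, the seed, the reading of "the same column" as copies of a single randomly drawn response column (consistent with the "perfectly identical items" axis of Fig 1), and the use of definition (2.3) for the ratio. The sketch is therefore not expected to reproduce the numbers in Table 1.

```python
import numpy as np

def cronbach_alpha(X):
    """Empirical Cronbach's alpha for an n x p score matrix."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    return (p / (p - 1)) * (1.0 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def icr(X, K):
    """Information consistency ratio: 1 - min_i H(z_i) / log2(K)."""
    Z = (X[:, :, None] == np.arange(1, K + 1)).mean(axis=1)   # n x K respondent distributions
    logs = np.log2(Z, out=np.zeros_like(Z), where=Z > 0)      # log2 with 0 * log2(0) treated as 0
    H = -(Z * logs).sum(axis=1)
    return 1.0 - H.min() / np.log2(K)

def simulate(n=1000, p=50, K=5, seed=0):
    """For c = 0.1, ..., 1.0, return a list of (c, information consistency ratio, Cronbach's alpha)."""
    rng = np.random.default_rng(seed)
    results = []
    for c in np.arange(1, 11) / 10:
        X = rng.integers(1, K + 1, size=(n, p))          # uniform answers on {1, ..., K}
        common = rng.integers(1, K + 1, size=n)          # one shared response column (assumption)
        cols = rng.choice(p, size=int(round(c * p)), replace=False)
        X[:, cols] = common[:, None]                     # replace 100c% of the columns by copies
        results.append((float(c), float(icr(X, K)), float(cronbach_alpha(X))))
    return results

results = simulate()
```

Each entry of `results` holds the fraction of replaced columns together with the two reliability measures computed on one simulated data matrix.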

4. Conclusion and Discussion

We have proposed and developed an information-theoretic measure of internal data consistency and
demonstrated via straightforward simulation that it does indeed capture the amount of information potentially
contained in the data for the purposes of performing all kinds of pattern analysis on the data. We have also provided
several other measures of similarity over probabilistic vectors that we intend to use to further refine
our proposed information consistency ratio $\varphi$. We intend to conduct a larger simulation study to establish our
proposed measure on a stronger footing. We also plan to compare the predictive power of ICR to Cronbach’s
alpha coefficient on various real and simulated data.